Apache Mahout Clustering Designs by Ashish Gupta

Author: Ashish Gupta
Language: eng
Format: azw3, pdf
Publisher: Packt Publishing
Published: 2015-10-07T16:00:00+00:00


The notations used in this figure are described here:

M, N, and K represent the number of documents, the number of words in a document, and the number of topics, respectively.

α is the prior weight of topic k in a document.

β is the prior weight of word w in a topic.

φ is the probability of a word occurring in a topic.

Θ is the topic distribution of a document.

Z is the topic assignment of every word in every document.

W is the identity of every word in every document.
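
To make the notation concrete, here is a minimal sketch of the standard LDA generative process written with the same symbols. It is an illustration under stated assumptions, not code from Mahout: the vocabulary size V, the function names, and the use of symmetric scalar values for α and β are all assumptions introduced for this example.

```python
import random

def sample_dirichlet(rng, concentration, size):
    """Draw from a symmetric Dirichlet by normalising Gamma draws."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]

def generate_corpus(M, N, K, V, alpha, beta, seed=0):
    """Hypothetical helper: generate M documents of N words from K topics.

    V (vocabulary size) is an assumption; it does not appear in the notation list.
    """
    rng = random.Random(seed)
    # phi[k][w]: probability of word w occurring in topic k (prior weight beta)
    phi = [sample_dirichlet(rng, beta, V) for _ in range(K)]
    theta, Z, W = [], [], []
    for _ in range(M):
        # theta_d: topic distribution of this document (prior weight alpha)
        theta_d = sample_dirichlet(rng, alpha, K)
        # z_d: topic assignment of each of the N words in the document
        z_d = rng.choices(range(K), weights=theta_d, k=N)
        # w_d: identity of each word, drawn from the word distribution of its topic
        w_d = [rng.choices(range(V), weights=phi[z])[0] for z in z_d]
        theta.append(theta_d)
        Z.append(z_d)
        W.append(w_d)
    return phi, theta, Z, W
```

Calling `generate_corpus(M=3, N=5, K=2, V=4, alpha=0.5, beta=0.5)` returns φ as K rows of V probabilities, Θ as M rows of K probabilities, and Z and W as M lists of N integers each. Training LDA runs this process in reverse: only W is observed, and the other quantities are inferred.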

How does LDA work in MapReduce mode? These are the steps that LDA follows in the mapper and reducer phases:

Mapper phase:

The program starts with an empty topic model.

The documents are read by different mappers.

The probability of each topic is calculated for each word in the document.
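
The mapper steps listed so far can be sketched as a single function that takes one document and the current topic model and emits a probability for every (topic, word) pair. This is a simplified illustration, not Mahout's actual mapper: the function name `map_document`, the representation of the model as one dictionary of word counts per topic, and the vocabulary size V are assumptions, and the document's topic weights are held at the prior α instead of being refined iteratively.

```python
def map_document(doc, topic_word_counts, V, alpha, beta):
    """Hypothetical mapper: compute P(topic | word) for each word in one document.

    doc               -- list of word ids in the document
    topic_word_counts -- one dict per topic mapping word id -> count;
                         a list of empty dicts represents the empty topic model
    V                 -- vocabulary size (an assumption for this sketch)
    """
    K = len(topic_word_counts)
    topic_totals = [sum(counts.values()) for counts in topic_word_counts]
    # Simplification: every topic starts with the prior weight alpha in this document.
    doc_topic = [alpha] * K
    emitted = []
    for w in doc:
        # Unnormalised score: document-topic weight times smoothed word-in-topic probability.
        scores = [
            doc_topic[k]
            * (topic_word_counts[k].get(w, 0.0) + beta)
            / (topic_totals[k] + beta * V)
            for k in range(K)
        ]
        norm = sum(scores)
        for k, score in enumerate(scores):
            # Emit ((topic, word), probability) for a later phase to aggregate.
            emitted.append(((k, w), score / norm))
    return emitted
```

With an empty model, every topic is equally likely for every word, which matches the first step above. Once the model holds counts, words that a topic has seen often receive a higher probability for that topic.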


